Smart Paper Notes

IIsy: Practical In-Network Classification

Changgang Zheng, Zhaoqi Xiong, Thanh T Bui, Siim Kaupmees, Riyad Bensoussane, Antoine Bernabeu, Shay Vargaftik, Yaniv Ben-Itzhak, Noa Zilberman

Category: Machine Learning

2022-05-17

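Data is currently winning the rat race between user-generated data and data-processing systems. The growing use of machine learning further increases processing requirements, while data volume keeps growing. To win the race, machine learning needs to be applied to data as it passes through the network. In-network classification of data can reduce the load on servers, reduce response time, and improve scalability. In this paper, we introduce IIsy, which implements machine learning classification models in a hybrid fashion using off-the-shelf network devices. IIsy targets three main challenges of in-network classification: (i) mapping classification models to network devices, (ii) extracting the required features, and (iii) addressing resource and functionality constraints. IIsy supports a range of traditional and ensemble machine learning models, scaling independently of the number of stages in a switch pipeline. Moreover, we demonstrate the use of IIsy for hybrid classification, where a small model is implemented on a switch and a large model at the backend, achieving near-optimal classification results while significantly reducing latency and load on the servers.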
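
The hybrid classification idea in the IIsy abstract can be illustrated with a minimal sketch. It assumes that a confidence threshold decides whether the small in-switch model answers directly or the input is forwarded to the large backend model. The threshold rule, the function names, and both toy models are hypothetical and are not taken from the paper.

```python
from typing import Callable, Sequence, Tuple

Features = Sequence[float]
# A model returns (label, confidence in [0, 1]).
Model = Callable[[Features], Tuple[str, float]]


def hybrid_classify(features: Features, small_model: Model, large_model: Model,
                    threshold: float = 0.8) -> Tuple[str, str]:
    """Return (label, where_decided) using a hypothetical confidence fallback."""
    label, confidence = small_model(features)
    if confidence >= threshold:
        return label, "switch"       # answered in the network
    label, _ = large_model(features)
    return label, "backend"          # forwarded to the server


# Toy stand-ins for the two models (assumptions for illustration only).
def toy_small_model(features: Features) -> Tuple[str, float]:
    # Confident only when the first feature is far from the 0.5 boundary.
    x = features[0]
    label = "attack" if x > 0.5 else "benign"
    confidence = min(1.0, 0.5 + abs(x - 0.5))
    return label, confidence


def toy_large_model(features: Features) -> Tuple[str, float]:
    score = sum(features) / len(features)
    return ("attack" if score > 0.5 else "benign"), 0.99


print(hybrid_classify([0.95, 0.1], toy_small_model, toy_large_model))
print(hybrid_classify([0.55, 0.1], toy_small_model, toy_large_model))
```

Only inputs the small model is unsure about reach the backend, which is where the reduction in server load described in the abstract would come from.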

translated by Google Translate

Invalidator: Automated Patch Correctness Assessment via Semantic and Syntactic Reasoning

Thanh Le-Cong, Duc-Minh Luong, Xuan Bach D. Le, David Lo, Nhat-Hoa Tran, Bui Quang-Huy, Quyet-Thang Huynh

Category: Machine Learning

2023-01-03

In this paper, we propose a novel technique, namely INVALIDATOR, to automatically assess the correctness of APR-generated patches via semantic and syntactic reasoning. INVALIDATOR reasons about program semantics via program invariants, while it captures program syntax via language semantics learned from a large code corpus using a pre-trained language model. Given a buggy program and the developer-patched program, INVALIDATOR infers likely invariants on both programs. Then, INVALIDATOR determines that an APR-generated patch overfits if (1) it violates correct specifications or (2) it maintains erroneous behaviors of the original buggy program. In case our approach fails to determine an overfitting patch based on invariants, INVALIDATOR utilizes a model trained on labeled patches to assess patch correctness based on program syntax. The benefit of INVALIDATOR is three-fold. First, INVALIDATOR is able to leverage both semantic and syntactic reasoning to enhance its discriminative capability. Second, INVALIDATOR does not require new test cases to be generated but instead relies only on the current test suite and uses invariant inference to generalize the behaviors of a program. Third, INVALIDATOR is fully automated. We have conducted our experiments on a dataset of 885 patches generated on real-world programs in Defects4J. Experimental results show that INVALIDATOR correctly classified 79% of the overfitting patches, detecting 23% more overfitting patches than the best baseline. INVALIDATOR also substantially outperforms the best baselines by 14% and 19% in terms of Accuracy and F-Measure, respectively.
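
The two invariant-based overfitting conditions in the abstract can be sketched as set operations. This is a simplified illustration, not INVALIDATOR's implementation: it assumes invariants are plain strings, that invariants shared by the buggy and developer-patched programs approximate correct specifications, and that invariants holding only on the buggy program approximate its erroneous behavior. The function name `is_overfitting` is hypothetical.

```python
from typing import Set


def is_overfitting(buggy: Set[str], developer: Set[str], patched: Set[str]) -> bool:
    """Apply the two invariant-based rules to an APR-generated patch."""
    # Assumption: invariants true of both versions describe correct behavior.
    correct_specs = buggy & developer
    # Assumption: invariants true only of the buggy version describe the bug.
    error_behaviors = buggy - developer

    violates_spec = not correct_specs.issubset(patched)   # rule (1)
    keeps_error = bool(error_behaviors & patched)          # rule (2)
    return violates_spec or keeps_error


buggy = {"x >= 0", "y == x", "ret != null"}
developer = {"x >= 0", "y == x + 1", "ret != null"}

print(is_overfitting(buggy, developer, {"x >= 0", "y == x + 1", "ret != null"}))
print(is_overfitting(buggy, developer, {"x >= 0", "y == x", "ret != null"}))
```

When neither rule fires, the abstract describes falling back to a learned syntactic classifier, which this sketch does not cover.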

Using Artificial Intelligence and IoT for Constructing a Smart Trash Bin

Khang Nhut Lam, Nguyen Hoang Huynh, Nguyen Bao Ngoc, To Thi Huynh Nhu, Nguyen Thanh Thao, Pham Hoang Hao, Vo Van Kiet, Bui Xuan Huynh, Jugal Kalita

Category: Computer Vision

2022-08-12

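The research reported in this paper transforms an ordinary trash bin into a smarter one by applying computer vision technology. With the support of sensor and actuator devices, the trash bin can automatically classify garbage. In particular, a camera on the trash bin takes pictures of the trash; the central processing unit then analyzes them and decides which bin the trash should be dropped into. Our trash bin system achieves an accuracy of 90%. In addition, our model is connected to the Internet to update the bin status for further management. A mobile application has been developed for managing the bin.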

Ergo, SMIRK is Safe: A Safety Case for a Machine Learning Component in a Pedestrian Automatic Emergency Brake System

Markus Borg, Jens Henriksson, Kasper Socha, Olof Lennartsson, Elias Sonnsjö Lönegren, Thanh Bui, Piotr Tomaszewski, Sankar Raman Sathyamoorthy, Sebastian Brink, Mahshid Helali Moghadam

Category: Machine Learning

2022-04-16

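The integration of machine learning (ML) components in critical applications introduces novel challenges for software certification and verification. New safety standards and technical guidelines are under development to support the safety of ML-based systems, e.g., ISO 21448 SOTIF for the automotive domain and the Assurance of Machine Learning for use in Autonomous Systems (AMLAS) framework. SOTIF and AMLAS provide high-level guidance, but the details must be chiseled out for each specific case. We initiated a research project with the goal of demonstrating a complete safety case for an ML component in an open automotive system. This paper reports results from an industry-academia collaboration on the safety assurance of SMIRK, an ML-based pedestrian automatic emergency braking demonstrator running in an industry-grade simulator. We demonstrate an application of AMLAS to SMIRK for a minimalistic operational design domain, i.e., we share a complete safety case for its integrated ML-based component. Finally, we report lessons learned and provide both SMIRK and the safety case under an open-source licence for the research community to reuse.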

VSEC: Transformer-based Model for Vietnamese Spelling Correction

Dinh-Truong Do, Ha Thanh Nguyen, Thang Ngoc Bui, Dinh Hieu Vo

Category: Natural Language Processing

2021-11-01

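Spelling error correction is one of the topics with a long history in natural language processing. Although previous studies have achieved remarkable results, challenges remain. For Vietnamese, a state-of-the-art method for the task infers a syllable's context from its adjacent syllables. The accuracy of this method can be unsatisfactory, however, because the model may lose the context if two (or more) spelling mistakes stand next to each other. In this paper, we propose a novel method to correct Vietnamese spelling errors. We tackle the problems of mistyped errors and misspelled errors using a deep learning model. In particular, the embedding layer is powered by the byte pair encoding technique. The sequence-to-sequence model based on the Transformer architecture distinguishes our approach from previous work on the same problem. In the experiments, we train the model on a large synthetic dataset into which spelling errors are randomly introduced. We test the performance of the proposed method on a realistic dataset. This dataset contains 11,202 human-made misspellings in 9,341 different Vietnamese sentences. The experimental results show that our method achieves encouraging performance, with 86.8% of errors detected and 81.5% of errors corrected, improving on the state-of-the-art approaches by 5.6% and 2.2%, respectively.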

Concept Drift Monitoring and Diagnostics of Supervised Learning Models via Score Vectors

Kungang Zhang, Anh T. Bui, Daniel W. Apley

Category: Machine Learning (Statistics) | Machine Learning

2020-12-12

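Supervised learning models are one of the most fundamental classes of models. Viewing supervised learning from a probabilistic perspective, the set of training data to which the model is fitted is usually assumed to follow a stationary distribution. However, this stationarity assumption is often violated in a phenomenon called concept drift, which refers to changes over time in the predictive relationship between the covariates $\mathbf{x}$ and the response variable $y$, and which can render trained models suboptimal or obsolete. We develop a comprehensive and computationally efficient framework for detecting, monitoring, and diagnosing concept drift. Specifically, we monitor the gradient of the log-likelihood of the fitted model (the score vector) using a form of multivariate exponentially weighted moving average, which monitors for general changes in the mean of a random vector. Despite the substantial performance advantages that we demonstrate over popular error-based methods, a score-based approach has not previously been considered for concept drift monitoring. Advantages of the proposed score-based framework include applicability to any parametric model, more powerful detection of changes as shown in theory and experiments, and inherent diagnostic capabilities that help identify the nature of the changes.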
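
A minimal sketch of score-based monitoring follows, using logistic regression as the parametric model. It is not the authors' framework: for brevity it assumes the in-control mean of the score is zero and that the score covariance is diagonal, whereas a full multivariate EWMA chart would use the full covariance matrix. The function names and the smoothing constant `lam` are illustrative choices.

```python
import math
from typing import List, Sequence


def logistic_score(theta: Sequence[float], x: Sequence[float], y: int) -> List[float]:
    """Gradient of the Bernoulli log-likelihood w.r.t. theta for one (x, y)."""
    z = sum(t * xi for t, xi in zip(theta, x))
    p = 1.0 / (1.0 + math.exp(-z))
    return [(y - p) * xi for xi in x]


def mewma_statistics(scores: Sequence[Sequence[float]], variances: Sequence[float],
                     lam: float = 0.1) -> List[float]:
    """Smooth score vectors with an EWMA and return one statistic per step.

    Simplifying assumptions: zero in-control mean, diagonal covariance with
    the given per-component variances.
    """
    z = [0.0] * len(variances)
    stats = []
    for t, s in enumerate(scores, start=1):
        z = [lam * si + (1.0 - lam) * zi for si, zi in zip(s, z)]
        # Variance factor of an EWMA started at zero, after t steps.
        scale = lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * t))
        stats.append(sum(zi * zi / (scale * v) for zi, v in zip(z, variances)))
    return stats


# A persistent shift in the mean score makes the statistic grow over time.
shifted = mewma_statistics([[1.0, 1.0]] * 20, variances=[1.0, 1.0])
print(shifted[0], shifted[-1])
```

In practice the statistic would be compared against a control limit calibrated on in-control data; how that limit is chosen is outside this sketch.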

Flexible Supervised Autonomy for Exploration in Subterranean Environments

Harel Biggie, Eugene R. Rush, Danny G. Riley, Shakeeb Ahmad, Michael T. Ohradzansky, Kyle Harlow, Michael J. Miles, Daniel Torres, Steve McGuire, Eric W. Frew

Category: Robotics

2023-01-02

While the capabilities of autonomous systems have been steadily improving in recent years, these systems still struggle to rapidly explore previously unknown environments without the aid of GPS-assisted navigation. The DARPA Subterranean (SubT) Challenge aimed to fast-track the development of autonomous exploration systems by evaluating their performance in real-world underground search-and-rescue scenarios. Subterranean environments present a plethora of challenges for robotic systems, such as limited communications, complex topology, visually-degraded sensing, and harsh terrain. The presented solution enables long-term autonomy with minimal human supervision by combining a powerful and independent single-agent autonomy stack with higher-level mission management operating over a flexible mesh network. The autonomy suite deployed on quadruped and wheeled robots was fully independent, freeing the human supervisor to loosely supervise the mission and make high-impact strategic decisions. We also discuss lessons learned from fielding our system at the SubT Final Event, relating to vehicle versatility, system adaptability, and re-configurable communications.

Muse: Text-To-Image Generation via Masked Generative Transformers

Huiwen Chang, Han Zhang, Jarred Barber, AJ Maschinot, Jose Lezama, Lu Jiang, Ming-Hsuan Yang, Kevin Murphy, William T. Freeman, Michael Rubinstein

Category: Computer Vision | Artificial Intelligence | Machine Learning

2023-01-02

We present Muse, a text-to-image Transformer model that achieves state-of-the-art image generation performance while being significantly more efficient than diffusion or autoregressive models. Muse is trained on a masked modeling task in discrete token space: given the text embedding extracted from a pre-trained large language model (LLM), Muse is trained to predict randomly masked image tokens. Compared to pixel-space diffusion models, such as Imagen and DALL-E 2, Muse is significantly more efficient due to the use of discrete tokens and requiring fewer sampling iterations; compared to autoregressive models, such as Parti, Muse is more efficient due to the use of parallel decoding. The use of a pre-trained LLM enables fine-grained language understanding, translating to high-fidelity image generation and the understanding of visual concepts such as objects, their spatial relationships, pose, cardinality etc. Our 900M parameter model achieves a new SOTA on CC3M, with an FID score of 6.06. The Muse 3B parameter model achieves an FID of 7.88 on zero-shot COCO evaluation, along with a CLIP score of 0.32. Muse also directly enables a number of image editing applications without the need to fine-tune or invert the model: inpainting, outpainting, and mask-free editing. More results are available at https://muse-model.github.io

Spectral Bandwidth Recovery of Optical Coherence Tomography Images using Deep Learning

Timothy T. Yu, Da Ma, Jayden Cole, Myeong Jin Ju, Mirza F. Beg, Marinko V. Sarunic

Category: Artificial Intelligence | Computer Vision

2023-01-02

Optical coherence tomography (OCT) captures cross-sectional data and is used for the screening, monitoring, and treatment planning of retinal diseases. Technological developments to increase the speed of acquisition often result in systems with a narrower spectral bandwidth, and hence a lower axial resolution. Traditionally, image-processing-based techniques have been utilized to reconstruct subsampled OCT data, and more recently, deep-learning-based methods have been explored. In this study, we simulate reduced axial scan (A-scan) resolution by Gaussian windowing in the spectral domain and investigate the use of a learning-based approach for image feature reconstruction. In anticipation of the reduced resolution that accompanies wide-field OCT systems, we build upon super-resolution techniques to explore methods that better aid clinicians in their decision-making and improve patient outcomes, by reconstructing lost features using a pixel-to-pixel approach with an altered super-resolution generative adversarial network (SRGAN) architecture.
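
The effect of Gaussian windowing in the spectral domain can be shown on a toy signal. The sketch below assumes a single reflector, modeled as a cosine fringe across 256 spectral samples, and a naive discrete Fourier transform to obtain the A-scan. The signal model, the sample count, and the window widths are illustrative assumptions, not the study's acquisition parameters.

```python
import cmath
import math
from typing import List


def gaussian_window(n_samples: int, sigma: float) -> List[float]:
    """Gaussian window centered on the spectrum; sigma is in samples."""
    center = (n_samples - 1) / 2.0
    return [math.exp(-0.5 * ((n - center) / sigma) ** 2) for n in range(n_samples)]


def a_scan(spectrum: List[float]) -> List[float]:
    """Magnitude of the DFT of a real spectrum, positive depths only (naive O(N^2))."""
    n = len(spectrum)
    profile = []
    for k in range(n // 2):
        acc = sum(s * cmath.exp(-2j * math.pi * k * i / n) for i, s in enumerate(spectrum))
        profile.append(abs(acc))
    return profile


def width_at_half_max(profile: List[float]) -> int:
    """Number of depth samples at or above half of the peak value."""
    half = max(profile) / 2.0
    return sum(1 for v in profile if v >= half)


N = 256
# Toy interferogram: one reflector at depth bin 40.
fringe = [math.cos(2 * math.pi * 40 * i / N) for i in range(N)]
wide = a_scan([f * w for f, w in zip(fringe, gaussian_window(N, 60.0))])
narrow = a_scan([f * w for f, w in zip(fringe, gaussian_window(N, 10.0))])

# A narrower spectral window gives a broader axial peak, i.e. lower resolution.
print(width_at_half_max(wide), width_at_half_max(narrow))
```

Both profiles peak at the same depth; only the peak width changes, which is the resolution loss that the learning-based reconstruction in the abstract aims to recover.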

An Adaptive Kernel Approach to Federated Learning of Heterogeneous Causal Effects

Thanh Vinh Vo, Arnab Bhattacharyya, Young Lee, Tze-Yun Leong

Category: Machine Learning | Artificial Intelligence | Machine Learning (Statistics)

2023-01-01

We propose a new causal inference framework to learn causal effects from multiple, decentralized data sources in a federated setting. We introduce an adaptive transfer algorithm that learns the similarities among the data sources by utilizing Random Fourier Features to disentangle the loss function into multiple components, each of which is associated with a data source. The data sources may have different distributions; the causal effects are independently and systematically incorporated. The proposed method estimates the similarities among the sources through transfer coefficients, and hence requires no prior information about the similarity measures. The heterogeneous causal effects can be estimated without sharing the raw training data among the sources, thus minimizing the risk of privacy leaks. We also provide minimax lower bounds to assess the quality of the parameters learned from the disparate sources. The proposed method is empirically shown to outperform the baselines on decentralized data sources with dissimilar distributions.
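
For readers unfamiliar with Random Fourier Features, the sketch below shows only the basic building block: approximating a Gaussian (RBF) kernel by an explicit finite-dimensional feature map. It does not reproduce the paper's federated algorithm or its transfer coefficients. The kernel parameterization `exp(-gamma * ||x - y||^2)`, the feature count, and all function names are assumptions for illustration.

```python
import math
import random
from typing import List, Sequence, Tuple


def sample_rff_params(dim: int, n_features: int, gamma: float,
                      rng: random.Random) -> Tuple[List[List[float]], List[float]]:
    """Draw frequencies and phases for k(x, y) = exp(-gamma * ||x - y||^2)."""
    std = math.sqrt(2.0 * gamma)  # frequencies follow N(0, 2 * gamma * I)
    weights = [[rng.gauss(0.0, std) for _ in range(dim)] for _ in range(n_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    return weights, phases


def rff_map(x: Sequence[float], weights: List[List[float]],
            phases: List[float]) -> List[float]:
    """Map x to features whose inner products approximate the kernel."""
    scale = math.sqrt(2.0 / len(weights))
    return [scale * math.cos(sum(wi * xi for wi, xi in zip(w, x)) + b)
            for w, b in zip(weights, phases)]


def rbf_kernel(x: Sequence[float], y: Sequence[float], gamma: float) -> float:
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


rng = random.Random(0)
weights, phases = sample_rff_params(dim=3, n_features=2000, gamma=0.5, rng=rng)
x, y = [0.1, 0.2, 0.3], [0.4, 0.0, -0.1]
approx = sum(a * b for a, b in zip(rff_map(x, weights, phases), rff_map(y, weights, phases)))
print(approx, rbf_kernel(x, y, 0.5))
```

Because the feature map is explicit, a kernel-based loss becomes a sum over per-sample feature vectors, which is what makes it possible to split such a loss into components held by different data sources.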
